217 research outputs found

    All biology is computational biology

    Here, I argue that computational thinking and techniques are so central to the quest of understanding life that today all biology is computational biology. Computational biology brings order into our understanding of life, it makes biological concepts rigorous and testable, and it provides a reference map that holds together individual insights. The next modern synthesis in biology will be driven by mathematical, statistical, and computational methods being absorbed into mainstream biological training, turning biology into a quantitative science. Cancer Research UK (grant number C14303/A17197)

    Patterns of Immune Infiltration in Breast Cancer and Their Clinical Implications: A Gene-Expression-Based Retrospective Study

    Background: Immune infiltration of breast tumours is associated with clinical outcome. However, past work has not accounted for the diversity of functionally distinct cell types that make up the immune response. The aim of this study was to determine whether differences in the cellular composition of the immune infiltrate in breast tumours influence survival and treatment response, and whether these effects differ by molecular subtype. Methods and Findings: We applied an established computational approach (CIBERSORT) to bulk gene expression profiles of almost 11,000 tumours to infer the proportions of 22 subsets of immune cells. We investigated associations between each cell type and survival and response to chemotherapy, modelling cellular proportions as quartiles. We found that tumours with little or no immune infiltration were associated with different survival patterns according to oestrogen receptor (ER) status. In ER-negative disease, tumours lacking immune infiltration were associated with the poorest prognosis, whereas in ER-positive disease, they were associated with intermediate prognosis. Of the cell subsets investigated, T regulatory cells and M0 and M2 macrophages emerged as the most strongly associated with poor outcome, regardless of ER status. Among ER-negative tumours, CD8+ T cells (hazard ratio [HR] = 0.89, 95% CI 0.80-0.98; p = 0.02) and activated memory T cells (HR 0.88, 95% CI 0.80-0.97; p = 0.01) were associated with favourable outcome. T follicular helper cells (odds ratio [OR] = 1.34, 95% CI 1.14-1.57; p < 0.001) and memory B cells (OR = 1.18, 95% CI 1.0-1.39; p = 0.04) were associated with pathological complete response to neoadjuvant chemotherapy in ER-negative disease, suggesting a role for humoral immunity in mediating response to cytotoxic therapy. Unsupervised clustering analysis using immune cell proportions revealed eight subgroups of tumours, largely defined by the balance between M0, M1, and M2 macrophages, with distinct survival patterns by ER status and associations with patient age at diagnosis. The main limitations of this study are the use of diverse platforms for measuring gene expression, including some not previously used with CIBERSORT, and the combined analysis of different forms of follow-up across studies. Conclusions: Large differences in the cellular composition of the immune infiltrate in breast tumours appear to exist, and these differences are likely to be important determinants of both prognosis and response to treatment. In particular, macrophages emerge as a possible target for novel therapies. Detailed analysis of the cellular immune response in tumours has the potential to enhance clinical prediction and to identify candidates for immunotherapy. HRA is an NIHR Academic Clinical Lecturer and was a recipient of a Career Development Fellowship from The Pathological Society of GB and N Ireland, and a Starter Grant for Clinical Lecturers from the Academy of Medical Sciences. LC, CC, and FM received funding from the CRUK & EPSRC Cancer Imaging Centre in Cambridge & Manchester (grant C197/A16465).
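
The abstract above mentions modelling cellular proportions as quartiles. A minimal Python sketch of that binning step follows. It is an illustration only: the function name, the tie handling, the "inclusive" quantile method, and the example values are all assumptions, since the abstract does not say how cut points were computed.

```python
from bisect import bisect_left
from statistics import quantiles


def assign_quartiles(proportions):
    """Map each proportion to a quartile label 1..4 using sample cut points."""
    # Three cut points (Q1, median, Q3); the "inclusive" method is an assumption.
    cuts = quantiles(proportions, n=4, method="inclusive")
    # bisect_left puts a value equal to a cut point into the lower quartile.
    return [bisect_left(cuts, p) + 1 for p in proportions]


# Hypothetical inferred CD8+ T cell fractions for eight tumours (made-up values).
cd8_fraction = [0.00, 0.02, 0.05, 0.08, 0.11, 0.15, 0.21, 0.30]
quartile = assign_quartiles(cd8_fraction)
print(quartile)  # [1, 1, 2, 2, 3, 3, 4, 4]
```

The resulting labels could then enter a survival or response model as an ordinal covariate; that modelling step is not shown here.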

    Reconstructing evolving signalling networks by hidden Markov nested effects models

    Inferring time-varying networks is important to understand the development and evolution of interactions over time. However, the vast majority of currently used models assume direct measurements of node states, which are often difficult to obtain, especially in fields like cell biology, where perturbation experiments often only provide indirect information of network structure. Here we propose hidden Markov nested effects models (HM-NEMs) to model the evolving network by a Markov chain on a state space of signalling networks, which are derived from nested effects models (NEMs) of indirect perturbation data. To infer the hidden network evolution and unknown parameter, a Gibbs sampler is developed, in which sampling network structure is facilitated by a novel structural Metropolis–Hastings algorithm. We demonstrate the potential of HM-NEMs by simulations on synthetic time-series perturbation data. We also show the applicability of HM-NEMs in two real biological case studies, in one capturing dynamic crosstalk during the progression of neutrophil polarisation, and in the other inferring an evolving network underlying early differentiation of mouse embryonic stem cells. This is the final published manuscript, originally published by The Annals of Applied Statistics here: http://projecteuclid.org/euclid.aoas/1396966294

    HTSanalyzeR: an R/Bioconductor package for integrated network analysis of high-throughput screens

    Motivation: High-throughput screens (HTS) by RNAi or small molecules are among the most promising tools in functional genomics. They enable researchers to observe detailed reactions to experimental perturbations on a genome-wide scale. While there is a core set of computational approaches used in many publications to analyze these data, a specialized software combining them and making them easily accessible has so far been missing.

    Refining cellular pathway models using an ensemble of heterogeneous data sources

    © Institute of Mathematical Statistics, 2018. Improving current models and hypotheses of cellular pathways is one of the major challenges of systems biology and functional genomics. There is a need for methods to build on established expert knowledge and reconcile it with results of new high-throughput studies. Moreover, the available sources of data are heterogeneous, and the data need to be integrated in different ways depending on which part of the pathway they are most informative for. In this paper, we introduce a compartment-specific strategy to integrate edge, node and path data for refining a given network hypothesis. To carry out inference, we use a local-move Gibbs sampler for updating the pathway hypothesis from a compendium of heterogeneous data sources, and a new network regression idea for integrating protein attributes. We demonstrate the utility of this approach in a case study of the pheromone response MAPK pathway in the yeast S. cerevisiae. This work was supported, in part, by NIH grant R01 GM-096193, NSF CAREER grant IIS-1149662, and by MURI award W911NF-11-1-0036 to Harvard University. EMA is an Alfred P. Sloan Research Fellow and a Shutzer Fellow at the Radcliffe Institute for Advanced Studies. FM acknowledges support from the University of Cambridge, Cancer Research UK (C14303/A17197), and Hutchison Whampoa Limited. FM and EMA contributed equally to this work.

    Phylogenetic quantification of intra-tumour heterogeneity.

    Intra-tumour genetic heterogeneity is the result of ongoing evolutionary change within each cancer. The expansion of genetically distinct sub-clonal populations may explain the emergence of drug resistance, and if so, would have prognostic and predictive utility. However, methods for objectively quantifying tumour heterogeneity have been missing and are particularly difficult to establish in cancers where predominant copy number variation prevents accurate phylogenetic reconstruction owing to horizontal dependencies caused by long and cascading genomic rearrangements. To address these challenges, we present MEDICC, a method for phylogenetic reconstruction and heterogeneity quantification based on a Minimum Event Distance for Intra-tumour Copy-number Comparisons. Using a transducer-based pairwise comparison function, we determine optimal phasing of major and minor alleles, as well as evolutionary distances between samples, and are able to reconstruct ancestral genomes. Rigorous simulations and an extensive clinical study show the power of our method, which outperforms state-of-the-art competitors in reconstruction accuracy, and additionally allows unbiased numerical quantification of tumour heterogeneity. Accurate quantification and evolutionary inference are essential to understand the functional consequences of tumour heterogeneity. The MEDICC algorithms are independent of the experimental techniques used and are applicable to both next-generation sequencing and array CGH data. This is the final published version. It was originally published by PLoS in PLoS Computational Biology here: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003535

    A phylogenetic latent feature model for clonal deconvolution

    Tumours develop in an evolutionary process, in which the accumulation of mutations produces subpopulations of cells with distinct mutational profiles, called clones. This process leads to the genetic heterogeneity widely observed in tumour sequencing data, but identifying the genotypes and frequencies of the different clones is still a major challenge. Here, we present Cloe, a phylogenetic latent feature model to deconvolute tumour sequencing data into a set of related genotypes. Our approach extends latent feature models by placing the features as nodes in a latent tree. The resulting model can capture both the acquisition and the loss of mutations, as well as episodes of convergent evolution. We establish the validity of Cloe on synthetic data and assess its performance on controlled biological data, comparing our reconstructions to those of several published state-of-the-art methods. We show that our method provides highly accurate reconstructions and identifies the number of clones, their genotypes and frequencies even at a modest sequencing depth. As a proof of concept, we apply our model to clinical data from three cases with chronic lymphocytic leukaemia and one case with acute myeloid leukaemia. CRUK (Core grant C14303/A17197, A20240 (Rosenfeld lab core grant), A19274 (Markowetz lab core grant)), University of Cambridge, Hutchison Whampoa Limited
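
The idea of genotypes placed as nodes in a tree, with mutations gained and lost along branches, can be illustrated with a small forward calculation. The Python sketch below is not Cloe and performs no inference: the clone names, mutation labels, and fractions are hypothetical, and it only shows how a clone tree plus clone fractions would determine the expected share of cells carrying each mutation, ignoring copy number and sequencing noise.

```python
def clone_genotypes(parent, gained, lost):
    """Derive each clone's mutation set by walking the clone tree from the root.

    parent[c] is the parent of clone c (None for the root); clones must be
    listed parents-first. gained[c] and lost[c] are the mutations acquired
    and lost on the branch leading to c.
    """
    genotypes = {}
    for clone in parent:  # dicts keep insertion order, so parents come first
        inherited = set() if parent[clone] is None else genotypes[parent[clone]]
        genotypes[clone] = (inherited | gained[clone]) - lost[clone]
    return genotypes


def expected_prevalence(genotypes, fraction):
    """Expected fraction of cells carrying each mutation, given clone fractions."""
    mutations = sorted(set().union(*genotypes.values()))
    return {
        m: sum(fraction[c] for c, g in genotypes.items() if m in g)
        for m in mutations
    }


# Hypothetical tree: normal -> A -> {B, C}; clone C loses mutation m2 again.
parent = {"normal": None, "A": "normal", "B": "A", "C": "A"}
gained = {"normal": set(), "A": {"m1", "m2"}, "B": {"m3"}, "C": {"m4"}}
lost = {"normal": set(), "A": set(), "B": set(), "C": {"m2"}}
fraction = {"normal": 0.5, "A": 0.25, "B": 0.125, "C": 0.125}

genotypes = clone_genotypes(parent, gained, lost)
prevalence = expected_prevalence(genotypes, fraction)
print(prevalence)  # {'m1': 0.5, 'm2': 0.375, 'm3': 0.125, 'm4': 0.125}
```

Deconvolution is the inverse problem: recovering the tree, genotypes, and fractions from noisy observed frequencies, which is the part the model in the abstract addresses.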

    Master Regulators of Oncogenic KRAS Response in Pancreatic Cancer: An Integrative Network Biology Analysis.

    BACKGROUND: KRAS is the most frequently mutated gene in pancreatic ductal adenocarcinoma (PDAC), but the mechanisms underlying the transcriptional response to oncogenic KRAS are still not fully understood. We aimed to uncover transcription factors that regulate the transcriptional response of oncogenic KRAS in pancreatic cancer and to understand their clinical relevance. METHODS AND FINDINGS: We applied a well-established network biology approach (master regulator analysis) to combine a transcriptional signature for oncogenic KRAS derived from a murine isogenic cell line with a coexpression network derived by integrating 560 human pancreatic cancer cases across seven studies. The datasets included the ICGC cohort (n = 242), the TCGA cohort (n = 178), and five smaller studies (n = 17, 25, 26, 36, and 36). 55 transcription factors were coexpressed with a significant number of genes in the transcriptional signature (gene set enrichment analysis [GSEA] p < 0.01). Community detection in the coexpression network identified 27 of the 55 transcription factors contributing to three major biological processes: Notch pathway, down-regulated Hedgehog/Wnt pathway, and cell cycle. The activities of these processes define three distinct subtypes of PDAC, which demonstrate differences in survival and mutational load as well as stromal and immune cell composition. The Hedgehog subgroup showed worst survival (hazard ratio 1.73, 95% CI 1.1 to 2.72, coxPH test p = 0.018) and the Notch subgroup the best (hazard ratio 0.62, 95% CI 0.42 to 0.93, coxPH test p = 0.019). The cell cycle subtype showed highest mutational burden (ANOVA p < 0.01) and the smallest amount of stromal admixture (ANOVA p < 2.2e-16). This study is limited by the information provided in published datasets, not all of which provide mutational profiles, survival data, or the specifics of treatment history. 
CONCLUSIONS: Our results characterize the regulatory mechanisms underlying the transcriptional response to oncogenic KRAS and provide a framework to develop strategies for specific subtypes of this disease using current therapeutics and by identifying targets for new groups. IdS and FM were funded by Cancer Research UK core grant C14303/A17197 and A19274 (to FM). LC was supported by the Cancer Research UK and Engineering and Physical Sciences Research Council Imaging Centre in Cambridge and Manchester (C197/A16465).

    Inferring signalling networks from longitudinal data using sampling based approaches in the R-package 'ddepn'

    Background: Network inference from high-throughput data has become an important means of current analysis of biological systems. For instance, in cancer research, the functional relationships of cancer-related proteins, summarised into signalling networks, are of central interest for the identification of pathways that influence tumour development. Cancer cell lines can be used as model systems to study the cellular response to drug treatments in a time-resolved way. Based on this kind of data, modelling approaches for the signalling relationships are needed that allow hypotheses to be generated on potential interference points in the networks. Results: We present the R-package 'ddepn' that implements our recent approach to network reconstruction from longitudinal data generated after external perturbation of network components. We extend our approach by two novel methods: a Markov Chain Monte Carlo method for sampling network structures with two edge types (activation and inhibition) and an extension of a prior model that penalises deviances from a given reference network while incorporating these two types of edges. Further, as an alternative prior we include a model that learns signalling networks with the scale-free property. Conclusions: The package 'ddepn' is freely available on R-Forge and CRAN: http://ddepn.r-forge.r-project.org, http://cran.r-project.org. It allows network inference from longitudinal high-throughput data to be performed conveniently using two different sampling-based network structure search algorithms.
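
As a rough illustration of sampling network structures with two edge types under a prior that penalises deviation from a reference network, here is a minimal Metropolis-Hastings sketch in Python. It does not reproduce the 'ddepn' package or its API: the edge encoding, the linear penalty in `log_prior`, the weight `lam`, the node names, and the flat placeholder likelihood are all assumptions made for the example.

```python
import math
import random

EDGE_TYPES = (-1, 0, 1)  # inhibition, no edge, activation (assumed encoding)


def log_prior(network, reference, lam):
    """Hypothetical penalty: -lam per unit of deviation from the reference."""
    return -lam * sum(abs(network[e] - reference[e]) for e in network)


def sample_networks(reference, log_likelihood, lam, n_iter, rng):
    """Metropolis-Hastings over signed networks with single-edge moves."""
    current = dict(reference)
    current_score = log_likelihood(current) + log_prior(current, reference, lam)
    edges = sorted(current)
    samples = []
    for _ in range(n_iter):
        # Propose changing one randomly chosen edge to a different type.
        proposal = dict(current)
        edge = rng.choice(edges)
        proposal[edge] = rng.choice([t for t in EDGE_TYPES if t != current[edge]])
        score = log_likelihood(proposal) + log_prior(proposal, reference, lam)
        # The proposal is symmetric, so the acceptance ratio is the posterior ratio.
        if rng.random() < math.exp(min(0.0, score - current_score)):
            current, current_score = proposal, score
        samples.append(dict(current))
    return samples


# Hypothetical three-node reference network; the flat likelihood is a placeholder,
# so this run samples from the prior alone.
reference = {("EGF", "ERK"): 1, ("ERK", "AKT"): -1, ("EGF", "AKT"): 0}
samples = sample_networks(reference, lambda net: 0.0, lam=2.0,
                          n_iter=1000, rng=random.Random(0))
```

In a real analysis the placeholder would be replaced by a likelihood of the longitudinal perturbation data given the network, and edge frequencies across `samples` would summarise posterior support for each activation or inhibition.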